ABSTRACT


This thesis examined whether American English speech recognition technology
can be used by Chinese speakers, in their native tongue, to achieve a reasonable degree
of recognition accuracy. Three experiments were completed. The first showed that
88.25% of 4305 trials of Chinese phoneme recognition were correctly recognized. The
second showed that 74.67% of 900 trials of simulated speaker independent mode
Chinese utterance recognition were correctly recognized. The third showed that 12.44%
of 900 trials of speaker dependent mode Chinese utterance recognition were incorrectly
recognized on the first attempt. Only 16 utterances required retraining to eventually
obtain a correct recognition.
I. BACKGROUND


A. INTRODUCTION 

About forty speech recognition/input studies were conducted at the United
States Naval Postgraduate School (NPS) during the past six years. The conclusion
suggested by these studies is quite significant: speech input, compared to
conventional manual input, is much more accurate and faster. And, since the hands are
free from typing on the keyboard, users may be capable of performing a secondary
assignment. From an early experiment conducted in 1980, Prof. Gary Poock drew
three conclusions. (1) Manual input had 183.2% more entry errors. (2) Speech
input was 17.5% faster. (3) Speech input allowed subjects to concurrently perform
25% more on a secondary job. See [Ref. 1] for detailed information.

Another highly valued finding is that speech input needs only a small amount of 
time to acquaint brand-new users with this input device, and results in a better 
performance than that of a well-trained operator who uses a keyboard as an input 
device. From the same experiment mentioned above, Prof. Poock found that the 
average time for the subjects to practice with the voice recognition equipment and feel 
ready to conduct the experiment was only 3.26 hours. This is much less than the time 
needed for familiarizing an individual with a keyboard device. 

The usage of English speech to input data to computer systems has proved to be
technically and practically feasible. At the same time, the range of potential military
and commercial applications of this medium appears extensive. All of these
encouraged the author to initiate this study and, hopefully, to provide some useful
information for further research and future possible applications of Chinese speech
recognition/input.


B. THE LANGUAGE AND THE RECOGNITION 

The language used in most studies mentioned above was English. There was, in
fact, only one experiment that examined a second language, German. As described in
[Ref. 2], the recognition system functioned equally well when training and testing used
German as an input language. The same study also examined the capability of the
recognition system (Threshold Technology T600 voice recognition system) to function
in a bilingual mode. However, significant degradation was observed when training and
testing were bilingual in nature.

During one of his Man-Machine Interface laboratory projects, the author, under
a programmed scenario, successfully operated the DDN with Chinese speech. DDN
stands for Defense Data Network, a large distributed network of computers
which are geographically located around the United States and other countries. From
that preliminary experiment, the author has shown that Chinese speech can also be an
effective input medium for command/control operations.


C. THE PURPOSE AND THE SCOPE 

Because its phonetic system is imperfect, Chinese speech presents a certain
degree of difficulty. For the same reason, some confusion about the phonetic
system has arisen over the past years. Although the difficulty itself will not
influence the recognition of Chinese speech, the reasons that caused the difficulty will.
In addition, all that confusion, if not clarified, will be a trouble area for Chinese
speech recognition in the future.

The main effort of this study is, then, to do a thorough study on Chinese speech
and the corresponding phonetic system. A brief discussion is provided in Chapter III.
The detailed discussion, provided in Chapter II, on the English part is mainly for
establishing a reference basis for the later discussions of Chinese speech. A further
experiment examining Chinese speech recognition was conducted. The description
of the experiment itself and the results obtained are provided in Chapters IV and V
respectively. Some suggestions on further studies are also discussed in Chapter V.


D. GENERAL INFORMATION ON THE STUDY 

The studies on the two languages within this thesis focused on the sounds of the
languages. Hence, it is necessary to point out that English used here means American
English while Chinese means Mandarin Chinese. The presentation of the speech
sounds during the discussion will be some selected letters quoted by special characters.
To differentiate them, the author uses /..../ to present English pronunciations and
<....> to present Chinese pronunciations.


The English phonetic system the author used is known as the KK Phonetic
System, established by two famous American linguists, Dr. John S. Kenyon and Dr.
Thomas A. Knott. Their A Pronunciation Dictionary Of American English has been an
international reference book for studying American English. The Chinese phonetic
system the author used is the only system compiled by the Chinese Department of
Education in 1918. The system is also known as <Droo In Foo Hao> in Chinese.
Consult [Ref. 3] and [Ref. 4] for a detailed description.

The KK System was so well established that it fully complied with the rule of
thumb for constructing a phonetic alphabet system: One symbol represents only one
unique sound, and one sound only has one unique symbol on its behalf. However, this is
not the case in the Chinese phonetic system. There are symbols representing two or
even three sounds, or two symbols actually representing the same sound. This is an
important feature deserving special attention for those who want to apply the current
Chinese phonetic system in Chinese speech recognition/input research. Further
discussion will be provided in Chapter III.




II. AN EXAMINATION OF ENGLISH SPEECH


A. THE SOUNDS OF ENGLISH 

According to the KK System, there are forty-one sounds used in English, which
are called phonemes of English. Among them, seventeen are vowels and twenty-four
are consonants. These forty-one sounds, depending on the way they are produced,
have been sorted into ten groups. Each sound is associated with a unique phonetic
symbol formulated by the International Phonetic Association. (Consult Appendix A
for more information on the original symbols used.) However, these phonetic
symbols are usually used only by linguists and therefore just several of them can be
found on the NPS IBM 3800-3 printer system. To ease our discussion, the author
constructed a symbol system to represent these forty-one sounds. Please see Table 1
for the general idea.

The phoneme is the smallest unit of significant distinctive sound. However, not
all phonemes can form a syllable, the smallest unit of English words. To form a
syllable, one and only one vowel sound is required as the base and may or may not be
preceded or followed by any consonant combinations. So /ei/, /bee/, /it/, /head/, and
/spleen/ are all considered single syllable words.

The most reliable way to discriminate phonemes is to first examine the manner
and then the speech organs used to produce the speech. [Ref. 5] has provided intensive
discussions on the production of each phoneme and can be a very helpful reference.
Human hearing is a good enough tool to tell the differences among sounds, but it is
not always reliable in trying to differentiate certain similar sound pairs such as /ee/ and
/i/, /oo/ and /o/, or /n/ and /ng/. We can use [Ref. 6] as a valuable source to obtain
detailed information on those sound pairs.

Certain sounds may be recognized on one speech recognition system but not on
another system. This is due to the algorithm design adopted by the recognizer
manufacturers. Although it is beyond the scope of this study, it is proper to note that
the algorithm of the recognizer has a dominant influence on the recognition
performance.


TABLE 1
PHONEMES OF AMERICAN SPEECH

[Table body not recoverable from the source. Surviving group headings: Front
Vowels (FV), Back Vowels (BV), Central Vowels (CV), Glides (GL), Affricates (AF),
Lateral (LA). Surviving example words: letter *, the, six, year, wait, right *, lay.]

* sounds not used in Chinese.





B. THE PRODUCTION OF SPEECH SOUNDS 

The production of vowels is primarily done by adjusting the shape and size of the
oral cavity, the main resonance chamber. Such adjustments are made by altering the
position of the tongue, jaw and lips. The vocal tract¹, during speech production,
remains relatively open and unobstructed. The production of consonants is done by
adopting certain articulatory motions to produce different types of sounds. Therefore,
we may discuss consonants by examining the place² of articulation and the manner³ of
articulation used to produce the sounds. During consonant production, some kind of
obstruction of the vocal tract is observed.

¹Vocal tract is the area through which the breath stream passes during the
production of the sounds.

In Table 1, some phonetic terminologies are used. From these
terminologies, one can easily obtain some information about the production of each
category of English speech. Here is a brief introduction to these terminologies. More
detailed information can be found in [Ref. 5].

Front Vowel is a vowel which is pronounced with the front part of the tongue
higher than the rest of the tongue. Front Vowel is also called Spread Vowel because it
is also pronounced with the lips spread. Back Vowel is a vowel which is pronounced
with the back part of the tongue higher than the rest of the tongue. Back Vowel is also
called Rounded Vowel because, of course, it is pronounced with the lips rounded.
Central Vowel, then, is a vowel which is pronounced with the middle part of the
tongue higher than the front or back of the tongue. The shape of the lips for Central
Vowels is, as you can imagine, somewhat between spread and rounded.

All three categories of vowels mentioned above are considered single vowels. 
Diphthongs are sounds that appear to be formed from the blend of two single vowels 
spoken together in the same syllable. What actually happens here is that the 
articulator begins the syllable in the position for one vowel and then shifts with a 
smooth and continuous transition movement toward the position for some other vowel. 
One can easily learn to detect the first and second vowels of the diphthongs. 

Fricative is a consonant consisting acoustically of friction noises. It is made
by directing the breath stream with adequate pressure against one or more points of
articulation, which leads to the hissing noises distinctive of Fricatives. Stop is a speech
sound which involves a complete blocking of the breath stream at some point and is
subsequently released with a somewhat audible explosive puff. That is why Stop is
sometimes also called Explosive. Nasal is the name chosen for the class because of the
distinctive nasal resonance that those sounds uniquely contain. Glide is a consonant
that consists primarily of the movement of an articulator which causes a rapid change
of resonance. Glide is also called Semivowel, because the starting position of
pronouncing each of them is a vowel: /ee/ for /y/, /oo/ for /w/ and /ur/ for /r/.


²Place of articulation includes bilabial, labiodental, linguadental, lingua-alveolar,
linguapalatal, linguavelar and glottal.

³Manner of articulation includes nasal, stop, fricative, affricate, lateral and glide.
Usually the tongue moves from the position of each vowel to that for the following
vowel in the same syllable. The sounds produced by the articulator movement between
the two vowels are represented by each Glide respectively. Affricate is a consonant
that is made up of two consonants: a Stop followed by a Fricative. Lateral is
produced in such a manner that the voiced breath stream escapes laterally over the
sides of the tongue.


C. THE PITCH AND INTONATION OF ENGLISH 

When you read an English word or a sentence composed of several words, your 
sound flow actually contains different pitches. Although each word has its unique 
pitch pattern in English, it has some variations when the same word is read with other 
words in a sentence. We use intonation as a term for the latter concept. 

English has been described as using four pitch levels. They are extra-high, high,
mid, and low. To simplify, numbers have been used to designate them. George L.
Trager and Henry L. Smith, Jr., in their An Outline of English Structure, chose 1 to
represent low. As the pitch level rises, the representation also increases in number. In
normal speech, however, extra-high, designated by 4, does not occur often. Extra-high
usually indicates excitement.

Since pitch is determined by the frequency of the sound, the pitch level is, from
the viewpoint of linguists, really a relative matter. There is no need to tell the
difference between the pitches of the same syllable produced by two persons. Similarly,
the attempt to tell the difference between the pitches of the same syllable produced by
the same person at different moments is also meaningless. However, there are indeed
certain rules regarding pitch which must be observed in order to generate
understandable English words. These rules are as follows:


1. The principal stressed syllable of a word will be pronounced with
high pitch (designated by 3).

2. All the syllables produced before the principal stressed syllable
will be pronounced with mid pitch (designated by 2).

3. All the syllables produced after the principal stressed syllable
will be pronounced with low pitch (designated by 1).

4. When the principal stressed syllable is the last syllable of a
word, the vowel sound of the syllable will present a 3-to-1
falling inflection of pitch.

5. An auxiliary stressed syllable will act similarly to a principal one;
the only difference is that its pitch level will be located
between high and mid.
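
As an illustration only, the five rules above can be sketched as a small Python function that labels each syllable of a word with a pitch level. The function name, the 0-based syllable indices, and the labels "3-1" (falling inflection) and "2.5" (a level between high and mid) are assumptions introduced here for the sketch; they are not notation used by the author.

```python
from typing import List, Optional


def word_pitch_pattern(num_syllables: int,
                       principal: int,
                       auxiliary: Optional[int] = None) -> List[str]:
    """Return one pitch label per syllable, following rules 1-5.

    `principal` and `auxiliary` are 0-based syllable indices
    (a hypothetical convention chosen for this sketch).
    """
    if not 0 <= principal < num_syllables:
        raise ValueError("principal stress index out of range")

    pattern = []
    for i in range(num_syllables):
        if i == principal:
            # Rule 1: high pitch; rule 4: 3-to-1 fall on a final stressed syllable.
            pattern.append("3-1" if i == num_syllables - 1 else "3")
        elif auxiliary is not None and i == auxiliary:
            # Rule 5: auxiliary stress sits between high (3) and mid (2).
            pattern.append("2.5")
        elif i < principal:
            # Rule 2: syllables before the principal stress are mid.
            pattern.append("2")
        else:
            # Rule 3: syllables after the principal stress are low.
            pattern.append("1")
    return pattern


# A three-syllable word stressed on the first syllable, e.g. "Michigan".
print(word_pitch_pattern(3, 0))
```

Because the levels are relative, the labels only order the syllables of one word against each other; they carry no absolute frequency.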


Some examples are provided in Figure 2.1, which apply the rules mentioned
above. Again, one should keep in mind that the pitch relationship among syllables of a
word is relative. As you can see, the first three examples are presented in an order
such that the principal stressed syllable appears at the first, second, and then the last
syllable of each word respectively. The last one is an example of a single-syllable word
that will be pronounced like the last syllable of the third example. When a word with an
auxiliary stressed syllable is encountered, you just insert that syllable into a level
between 3 and 2, and pronounce it with a pitch higher than the mid pitch syllable but
lower than the high pitch syllable of the word.


Figure 2.1 Examples of English Pitch Patterns.


The 3-level pitch system can also be applied in discussing intonation, where the 
whole sentence is put into a pitch frame having a wider frequency range for each level. 
To obtain the idea, see examples in Figure 2.2. 

The first example represents the most common and colorless intonation pattern
in English, which is designated by the number 231. Simple statements and questions
starting with question words always use this pattern. The second intonation pattern is
used by what we call 'yes/no questions', and is designated by the number 233. The last
one is an example of a simple statement colored by extra meaning, and is
designated by the number 223. Interested readers may consult [Ref. 7] for a complete
discussion on this subject. The main point the author wants to address here is that the
pitch pattern of an English word may change depending on how/where it appears
within a sentence.

Level
  3               Mi                    Michigan?                  chigan?
  2    He is from    chi     Is he from              He is from Mi
  1                      gan.

Figure 2.2 Examples of English sentence intonation.
III. AN EXAMINATION OF CHINESE SPEECH


A. THE SOUNDS OF CHINESE 

The original Chinese phonetic system had 41 symbols. However, the current
system has only 37 symbols; four symbols were deleted. Two of them represent
sounds that exist in English as well, /ng/ and /v/. The reasons for the two deletions,
however, were different. The symbol representing the sound /ng/ was deleted because
the system had another symbol also representing the sound. The symbol for /v/ was
deleted simply because Chinese does not have the speech sound /v/. The third one was
a Nasal sound produced with the tongue-front pushed against the hard palate, which
does not exist in either English or Chinese. The fourth symbol, representing two very
similar Front Vowels of Chinese, was deleted for, probably, the following two reasons.
First, they are not able, as other finals are, to form a syllable by themselves. They must
follow a particular Fricative. Second, the articulation places of the two sounds are the
same as that of the Fricatives which precede them. This deletion causes some Chinese
characters to appear as if they were represented by a single consonant. As a remedy,
the author uses <ih> to represent the two sounds, which is shown as the 38th symbol
of Table 2.

Given the historical information mentioned above, the author constructed a
38-symbol table for the Chinese phonetic system, which actually can be seen as a
romanization system. Appendix B provides a table that simultaneously presents
several currently existing romanization systems, namely, Yale (YL), Wade-Giles (WG),
Chinese Phonetic System Second Form (SF), PinYin (PY), and the system suggested by
the author (SG), for purposes of cross reference. The order of the symbols in Table 2 is
exactly the same as that of the existing phonetic system. The first 21 symbols are
consonants, also called initials, and the succeeding 17 symbols are vowels or
combinations of a vowel and a Nasal, also called finals. The aliases come from
the features of Chinese pronunciation. Chinese characters are always single-syllable
sounds. They usually are an initial followed by a final, an initial and a Glide then
followed by a final, or just a final itself. In most situations, the characters end with a
vowel sound. The only two consonants allowed to be produced at the end of a
character are the sounds /n/ and /ng/. Although /n/ is also one of the 21 initials, the
Chinese phonetic system has another symbol to represent the /n/ sound that appears at
the end of a character. Hence, those 21 consonants will always be the initial part of a
character sound.
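
The three usual syllable shapes described above can be sketched as a simple classifier. This is only an illustration: the function name is hypothetical, and the sample symbol sets below are small, partial subsets picked from symbols mentioned in this chapter, not the full 38-symbol inventory of Table 2.

```python
from typing import Sequence, Set

# Partial, illustrative subsets only (assumptions for this sketch).
SAMPLE_INITIALS: Set[str] = {"m", "l", "d", "k", "s", "sh"}
SAMPLE_GLIDES: Set[str] = {"i", "oo", "iu"}      # symbols that can act as Glides
SAMPLE_FINALS: Set[str] = {"a", "an", "en", "ang", "eng", "i", "oo"}


def syllable_shape(symbols: Sequence[str],
                   initials: Set[str] = SAMPLE_INITIALS,
                   glides: Set[str] = SAMPLE_GLIDES,
                   finals: Set[str] = SAMPLE_FINALS) -> str:
    """Classify a character sound by the three usual shapes in the text."""
    if len(symbols) == 1 and symbols[0] in finals:
        return "final"
    if len(symbols) == 2 and symbols[0] in initials and symbols[1] in finals:
        return "initial+final"
    if (len(symbols) == 3 and symbols[0] in initials
            and symbols[1] in glides and symbols[2] in finals):
        return "initial+glide+final"
    # The text says "usually", so anything else is reported, not rejected.
    return "other"


print(syllable_shape(["m", "a"]))          # <ma>
print(syllable_shape(["l", "i", "an"]))    # initial, Glide, final
```

Note that an initial never appears in the last position of any accepted shape, which mirrors the statement that the 21 consonants always form the initial part of a character sound.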


TABLE 2
CHINESE PHONETIC SYSTEM

Initials: Bilabials, Labiodental, Lingua-alveolars, Glottal, Lingua-palatals,
Lingua-alveolars
    17. sh....FR *

Finals: Single Vowels, Diphthongs, Combinations, Single Vowels
    30. an *
    36. oo....BV +

[Remaining table entries are not recoverable from the source.]

* sounds not used in English; further discussion provided

+ further discussion provided


A quick look at Table 2 shows that the consonants of the Chinese phonetic system
are grouped by the articulation organs used to make each sound. The reason for this
was mentioned by Prof. Francis Dow in his work [Ref. 8: p.24], and is quoted below:


1. The consonants of each category have their homorganic nature
in articulation.

2. In the constitution of syllables, certain sets of initials occur
before certain sets of finals (consult Table 5).

3. It is more convenient to compare the consonants of each category
with those in other Chinese dialects.


In Table 2, twenty symbols followed by neither '+' nor '*' are sounds also used
in English. The author selected exactly the same symbols shown in Table 1 to
represent them respectively. Seven symbols followed by a '+' are sounds also used in
English, but some details need to be clarified. Eleven symbols followed by '*' are
sounds not used in English; therefore, a brief introduction is provided for each of them.
The following two sections provide detailed discussions on this.


B. CLARIFICATION OF CONFUSIONS IN THE CHINESE PHONETIC 
SYSTEM 


The 20th and 19th symbols represent a pair⁴ of affricates. Sound <ts> is
voiceless and <dz> is the voiced counterpart of <ts>. They appear, in English, at
the end of the plural form of nouns with ending sound /t/ or /d/ respectively, such as
hats and hands.

The 22nd symbol, <a>, represents three different sounds. All of them are used
in English, but only two are considered phonemes. The first sound is /a/ of car and the
second sound is /u/ of cut. The third one is the first half of the diphthongs /ai/ and /ow/;
however, it is, in Chinese, the most frequently used sound among the three. The
author suggests using <aa> to represent this, since the lips, when producing the
sound, are spread wider than when producing sound /a/. And the symbols for the
remaining two sounds are, as their English counterparts, <a> and <u>.


⁴Two sounds are considered a pair when they adopt the same method and use
the same articulator and point of articulation for pronunciation. The only difference is
that one is voiceless and the other is its voiced counterpart.
The symbol <a> has already caused an unrecoverable damage in Chinese. No
one, at the present time, is able to tell, when encountering a character with symbol <a>,
which one of the three sounds should be used. Words in Chinese such as mother,
<ma ma>, lama, <la ma>, or to punch a card, <da ka>, should actually be
pronounced, from the author's limited-scale investigation, as <mu mu>, <laa mu>
and <daa ka>. Since the situation is already confused, no one has the
authority to say which one of the three sounds should be the right sound for certain
characters. A further wide-range investigation is needed if one is really anxious to use
the right sound for characters with symbol <a>. And, probably, the end product of
the investigation would only be the majority-used sounds of the general population in
this age. However, since it is beyond the scope of this thesis, the author leaves the
problem to future researchers. For the purpose of simplifying the following discussion,
the author will, from now on, use only <a> to represent the three sounds.

Both the 31st and the 33rd symbols represent two different sounds. They
represent sounds /n/ and /ng/ respectively in some cases and /e/ followed by /n/ or by
/ng/ in some other cases. Although many people are confused by these two symbols, a
careful study certainly helps to differentiate their usages. Symbol <en>, in
most situations, represents sound /e+n/, except when appearing after the symbols
<i> and <iu>. In the latter case, the <en> represents sound /n/. Symbol
<eng>, the same as <en>, represents the sound /e+ng/ most of the time, but when
appearing after symbol <i>, <oo> or <iu>, it represents sound /ng/ as well. See
Table 6 for some examples.
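
The context rule for <en> and <eng> described above can be written down as a short lookup. This is a minimal sketch, assuming symbols are passed as plain strings without angle brackets; the function name and the string outputs "e+n", "n", "e+ng" and "ng" are illustrative choices made here.

```python
from typing import Optional


def realize_nasal_final(final: str, preceding: Optional[str] = None) -> str:
    """Return the sound represented by <en> or <eng> in context.

    `preceding` is the symbol immediately before the final, or None
    if nothing relevant precedes it.
    """
    if final == "en":
        # <en> is /n/ only after <i> and <iu>; otherwise /e+n/.
        return "n" if preceding in {"i", "iu"} else "e+n"
    if final == "eng":
        # <eng> is /ng/ after <i>, <oo> or <iu>; otherwise /e+ng/.
        return "ng" if preceding in {"i", "oo", "iu"} else "e+ng"
    raise ValueError("only the finals 'en' and 'eng' are handled")


print(realize_nasal_final("en", "i"))     # n
print(realize_nasal_final("eng", "oo"))   # ng
print(realize_nasal_final("en", "oo"))    # e+n
```

The asymmetry is the point of the sketch: <oo> changes the reading of <eng> but not of <en>.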

Again, the 35th symbol, <i>, represents three sounds which are also used in
English. They are /i/ and /ee/ of Front Vowels and /y/ of Glides. To tell when <i>
represents sound /y/ is easy, because once one notes an <i> appearing before a
final, one is almost sure that the symbol <i> represents sound /y/. However, the finals
<en> and <eng> are two exceptions. In this situation, the symbol <i> represents
sound /i/ or /ee/, with the two finals becoming consonants /n/ and /ng/.

In the case of telling whether /i/ or /ee/ is represented by symbol <i> for a
certain character, one faces the same problem discussed earlier. It is again an
unrecoverable damage which was caused many years ago. Secret, as an example, in
Chinese symbolized by <mi mi> should in fact be pronounced as <mee mi>. The
author, for the same reason, leaves the problem to researchers for further study and
uses the symbol <i> to represent the two sounds throughout the following discussions.


Although the 36th symbol represents two sounds also used in English, we can
easily tell them apart by examining the usages of the symbol. The two sounds
represented by the symbol are /oo/ of Back Vowels and /w/ of Glides. Once an <oo>
is found before a final, for most situations, we know that it is sound /w/. However, the
final <eng> is the only exception. In this case and in the case that the <oo> itself
is the final part of a character, we know that the sound /oo/ is represented.
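
The Glide-or-vowel decisions for <i> and <oo> discussed in this section can be summarized in a small sketch. The function name and return labels are assumptions made for illustration; the label "i" stands for either /i/ or /ee/, since the author uses the single symbol <i> for both. The text hedges these rules with "almost sure" and "for most situations", so the sketch covers only the stated cases.

```python
from typing import Optional


def realize_i_or_oo(symbol: str, following_final: Optional[str]) -> str:
    """Decide whether <i> or <oo> is a Glide or a vowel.

    `following_final` is the final that comes after the symbol,
    or None when the symbol itself is the final part of the character.
    """
    if symbol == "i":
        # Before a final <i> is the Glide /y/, except before <en> and <eng>.
        if following_final is None or following_final in {"en", "eng"}:
            return "i"
        return "y"
    if symbol == "oo":
        # Before a final <oo> is the Glide /w/, except before <eng>.
        if following_final is None or following_final == "eng":
            return "oo"
        return "w"
    raise ValueError("only the symbols 'i' and 'oo' are handled")


print(realize_i_or_oo("i", "a"))       # y
print(realize_i_or_oo("oo", "eng"))    # oo
```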


TABLE 3
SUPPLEMENTARY TO THE CHINESE PHONETIC SYSTEM

NO   Original   Suggested   Articulation
22   <a>        <a>         BV
                <u>         CV
                <aa>        CV
31   <en>       <en>        CV+NA
                <n>         NA
33   <eng>      <eng>       CV+NA
                <ng>        NA
35   <i>        <i>         FV
                <ee>        FV
                <y>         GL
36   <oo>       <oo>        BV
                <w>         GL
37   <iu>       <iu>        FV
                <yw>        GL





Table 3 provides a summary of this section, which lists all the symbols that are
easily confused. The first column of the table is the number of each symbol, which
corresponds to the number appearing in Table 2. The second column lists all the
symbols, except 19 and 20, discussed in this section. Symbols 19 and 20 are not
included because they are not confused at all. Symbol 37 is listed here too, but the
discussion is provided in the next section, because it is a sound existing uniquely in
Chinese. The third column gives the author's suggestions for what each symbol should
actually be, according to the discussions provided in this chapter. The last column
provides articulation information on each symbol. Consult Table 1 and the discussions
provided in Chapter II for a better understanding of the abbreviations used here.


C. INTRODUCTION TO UNIQUELY EXISTING SOUNDS IN CHINESE 

The 14th sound of the Chinese phonetic system, <hs>, is a Fricative. To
produce the sound, one needs to raise the tongue-front toward, but not
touching, the hard palate, and let the tongue-tip stretch down against the lower teeth
ridge. With the tongue held in this position, an unvoiced breath stream is directed
against the hard palate, lower teeth ridge and teeth to produce the sound <hs>.

The 17th and 18th symbols, <sh> and <r>, represent a pair of Fricatives also. 
These two sounds do not appear in English, but they have some similarities to the 
sound pair /sh/ and /ge/ in English. The author directly ‘borrowed’ the symbols from 
English for reasons mentioned below. 

The only difference between /sh/ and <sh> is the articulators used by the two 
sounds. The /sh/ sound requires raising the tongue-mid toward the hard palate; while 
the <sh> sound uses tongue-front to stretch toward the hard palate. Everything else 
is the same. 

Just as /ge/ is to /sh/, the sound <r> is the voiced counterpart of <sh>.
The reason we do not use <ge> is that /r/ also has some similarities to the sound
<r>, and /r/ appears more often as an initial, which is exactly the characteristic that <r>
has. The way to produce the sound <r> is the same as producing <sh> except for
adding the vibration of the vocal cords, which is the main feature of voiced sounds.

The 16th and 15th symbols, <tsh> and <dr>, are a pair of Affricates. Their
relationship with the sound pair of <sh> and <r> is just like that of /ch/ and /j/ to
the sound pair /sh/ and /ge/ in English. That is why the author selected <t + sh>
and <d + r> to represent the two sounds respectively. And, the way to produce the
sound <tsh> is just as the symbol itself suggests: do a preparation action as if you
were going to produce a <t> sound. When ready, actually produce an <sh> sound
instead. It is similar to producing the English sound /ch/ except for using a different
articulator. To produce <dr> is the same as producing <tsh> except for adding a
vibration of the vocal cords, since <dr> is its voiced counterpart.

Although the 28th sound, <ao>, does not appear in English, there is indeed a
very similar sound in English. That is /ow/. The only difference between the two
sounds is the starting position of the first half of the sound. The sound /ow/ is a diphthong
formed by blending <aa> and <o> together; however, the sound <ao> is
produced by blending <a> and <o> together. A careful examination of the lips'
shape can certainly help to distinguish the two sounds, <a> and <aa>, without any
difficulty.


The 30th symbol, <an>, is actually a combination of a Central Vowel and a
Nasal. It does not appear in English because of the vowel part of the sound. It is
<aa>, which does not appear in single vowel form in English. However, one can find
the sound in the first half of diphthongs such as /ai/ and /ow/. In a similar
manner, the 32nd symbol, <ang>, is classified as a sound not existing in English for the
same reason.

The 34th symbol, <er>, is directly 'borrowed' from the English phoneme /er/.
Although the two sounds have some similarities, they are not the same. The sound /er/
is a short, lax⁵, mid-central, r-colored⁶ vowel which can be produced by tongue
retroflexion⁷. The sound <er> is short and r-colored too, but it is a high-front and tense⁸
vowel. The tenseness is caused by keeping the tongue retroflexed and stretching the tongue
forward to the hard palate simultaneously.

The 37th symbol, <iu>, actually represents two sounds. One is a vowel and the 
other is a Glide whose start position of production is the vowel. The author suggests 
using <yw> to represent the said Glide. The sound <iu> is a Front Vowel but not 
a Spread Vowel. When pronouncing the sound, one must hold the tongue in the 
position of producing sound <i> and, at the same time, round the lips as if producing 
sound <oo>. The sound, hence, can be described as a lower high-front, rounded, 
tense vowel. The relationship between <iu> and <yw> is just as <ee> to <y> or
<oo> to <w>.

Although the 38th symbol, <ih>, represents two different vowels, the author
does not intend to differentiate them with two symbols, for four reasons: first, they
are very similar; second, the speech organs used by each of them are identical to those
of a particular Fricative; third, each of them can only follow a particular group of sounds
that are formed by that same Fricative; fourth, they do not exist independently as other
finals do.


⁵Lax vowel is a vowel which is pronounced with the muscles of the throat,
tongue and corresponding mouth lax.

⁶The r-color is an acoustic effect of a simultaneously articulated 'r' imparted to a
vowel by retroflexion or bunching of the tongue.

⁷Retroflexion is the articulation with or involving the participation of the tongue
tip raised and retracted toward the hard palate.

⁸Tense vowel is a vowel which is pronounced with the muscles of the throat,
tongue and corresponding mouth tense.
To produce the first sound, one needs to stretch the tongue-front toward the
hard palate, the same articulation position as for producing the sound <sh>, and then
vibrate the vocal cords and let the sound resonate in the oral cavity. This sound can
only follow the sounds <sh>, <r>, <tsh> and <dr>. Similarly, to produce the
second sound, one needs to stretch the tongue-front toward the alveolar ridge, the
same articulation position as for producing the sound <s>, and then vibrate the vocal cords
and let the sound resonate in the oral cavity. This sound can only follow the sounds
<s>, <ts> and <dz>.

Since the ending position of the sounds <sh>, <r>, <tsh> and <dr> is the
position for producing <sh>, and the articulation position of the <ih> that follows
these sounds is also that of <sh>, when we produce the syllable <sh + ih>, for example,
we actually produce the consonant first and then maintain the same articulation
position and produce the vowel. Similarly, because the ending position of the sounds <s>, <ts>
and <dz> is the position for producing <s>, and the articulation position
of the <ih> that follows these sounds is also that of <s>, when we produce the syllable <s
+ ih>, for example, we actually produce the consonant first and then maintain the
same articulation position and produce the vowel. That is why the author intends to
use the same symbol to represent the two similar sounds. It is probably, as mentioned
earlier, also the main reason why this symbol was deleted in the first place.

Table 4 summarizes the discussions of the last two sections and
provides complete information about the Chinese speech phonemes. There are 25
consonants in the table, and 21 of them are the initials of the original phonetic system.
Three of the remaining ones are the Glides, which share their symbols with three
finals, namely <i>, <iu> and <oo>. The last one is the <ng> separated
from the final <eng>. There are also 16 vowels in the table. Only 12 of them are
from the original system. The four symbols <an>, <en>, <ang> and <eng> are
dismissed because they are simply combinations of two phonemes. The four new
vowels in this table are <ee> separated from <i>, <u> and <aa> separated from
<a>, and the sound <ih>. Like their English counterparts, these phonemes are grouped
into ten categories. The three groups of single vowels are ordered so that the
sound produced with the highest tongue posture in each group comes first and the
lowest last. The four groups of consonants, namely Nasal, Stop, Fricative and
Affricate, are ordered so that the sound produced at the outermost part of the vocal
tract comes first and the innermost last. The remaining groups, however, have no
special order at all.
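
The phoneme counts given above can be checked with simple arithmetic. The following Python sketch is an illustration only; the variable names are assumptions introduced here, and the figure of 105 trials per phoneme is taken from the 21-session, five-reading phoneme experiment reported elsewhere in this thesis.

```python
# Phoneme inventory as described for Table 4 (names are illustrative only).
original_initials = 21       # initials of the original phonetic system
glides = 3                   # share symbols with the finals <i>, <iu>, <oo>
separated_ng = 1             # <ng> separated from the final <eng>
consonants = original_initials + glides + separated_ng

original_vowels = 12         # vowels kept from the original system
new_vowels = 4               # <ee>, <u>, <aa> and <ih>
vowels = original_vowels + new_vowels

phonemes = consonants + vowels
trials_per_phoneme = 21 * 5  # 21 sessions, five readings per session
total_trials = phonemes * trials_per_phoneme

print(consonants, vowels, phonemes, total_trials)
```

The resulting total agrees with the number of Chinese phoneme trials quoted in the abstract.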
TABLE 4
PHONEMES OF CHINESE SPEECH

Front Vowels (FV), Back Vowels (BV), Central Vowels (CV), Diphthongs (DI),
Stops (ST), Nasals (NA), Fricatives (FR), Affricates (AF), Glides (GL), Lateral (LA)

* sounds not used in English.


D. THE SOUND COMBINATIONS OF CHINESE

We have devoted a lot of effort to studying the phonemes of Chinese speech, and
we are now ready to take a further step and examine the sounds of Chinese characters.
As we already know, Chinese characters are always single syllables, formed by an
initial followed by a final, by an initial plus a Glide followed by a final, or by just a
final itself. This statement now needs a minor amendment. Since Glides can also
function as initials, a Glide followed by a final can also form a Chinese
character sound.

Table 5 provides a matrix of Chinese character sounds formed by initials followed
by finals. There are 374 possible sound combinations in total. This number is obtained by
multiplying 21 (initials) by 17 (finals, including <ih>) and then adding 17. The extra
17 represents the character sounds formed by finals alone. However,
according to the information provided in [Ref. 3: p. 30], only 220 actually exist in
Chinese speech. In Table 5, the letter x marks those sounds that actually exist.
The letters c(hange) and d(elete) mark the sounds that the author has modified. In the
author’s opinion, the sounds <bo>, <po>, <mo>, <fo> and <lo> should
actually be <bwo>, <pwo>, <mwo>, <fwo> and <lwo> respectively. The
letter n(ew) marks the sounds not appearing in the source table. See also
Appendix A for the original table used, which, however, has been reformatted by the
author for easy observation.

Table 6 provides a matrix of character sounds formed by an initial plus a Glide
followed by a final. There are 484 possible sound combinations in total. Among
them, 22 are actually sounds formed by a Glide followed by a final.
According to the same information source, however, only 190 actually exist.
Hence, there are 858 possible sound combinations in total, and only 410 of them actually
exist in Chinese speech.
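
The totals quoted for Tables 5 and 6 can be reproduced as follows. This Python sketch is an illustration only. The reading of 484 as 22 Glide-plus-final pairs, each taken with one of 21 initials or with no initial at all, is an assumption made here; it is consistent with the 22 Glide-plus-final sounds mentioned in the text but is not stated there.

```python
INITIALS = 21        # initials of the original phonetic system
FINALS = 17          # finals, including <ih>
GLIDE_FINALS = 22    # Glide + final pairs (assumed to be the rows of Table 6)

# Table 5: initial + final, plus the finals standing alone.
table5_possible = INITIALS * FINALS + FINALS
# Table 6 (assumed reading): each Glide + final pair with any initial,
# or with no initial at all.
table6_possible = GLIDE_FINALS * (INITIALS + 1)

possible = table5_possible + table6_possible
existing = 220 + 190  # sounds reported as actually existing

print(table5_possible, table6_possible, possible, existing)
print(f"{existing / possible:.1%} of the possible sounds exist")
```

Under these assumptions, somewhat less than half of the possible sound combinations are actually used in Chinese speech.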


E. THE TONES AND INTONATIONS OF CHINESE

Mandarin Chinese is a tone language, because it uses pitches to distinguish
lexical meaning⁹. There are four lexical tones in Chinese. Usually they are referred to
as tone-1 through tone-4. They are also called, in Chinese, <in1 ping2> for tone-1,
<yang2 ping2> for tone-2, <shang4 sheng1> for tone-3 and <chiu4 sheng1> for
tone-4. These tones may be associated with any sound combination to form at least
four different Chinese characters, if all of them exist. The Chinese character sound <ma>, for
example, means ‘mother’ with tone-1, ‘numb’ with tone-2,
‘horse’ with tone-3 and ‘to scold’ with tone-4.


⁹ Lexical meaning is the meaning of the base (as the word play) in a paradigm (as
plays, playing and played).


TABLE 5
INITIAL + FINAL SOUND COMBINATIONS

c for change, d for delete and n for newly add.





There are two features deserving special attention. First, not all sound
combinations are associated with all four tones. According to an early investigation
described in [Ref. 9], only 1272 out of a total of 1640 sound-tone
combinations actually exist in the Chinese language. Secondly, there are more than
forty-eight thousand Chinese characters. Among these characters, 4808 are frequently
used. In either case, a severe homonymic problem occurs. Take the sound <i> as an
example. There are 173 Chinese homonyms¹⁰ of it in the Chinese language. It is
impossible for a recognizer to distinguish these characters. Therefore, a vocabulary
whose items are formed by more than one character is recommended.

According to a 5-point tone system established by Dr. Drao Ywan Ren many
years ago, the Chinese tones can be expressed with a 5-level pitch matrix. Imagine that
the matrix is in the first quadrant of a rectangular coordinate system. On the vertical
axis, five points, from one to five, represent the pitch of a Chinese character from low
to high. On the horizontal axis, five points, again from one to five, represent the
elapsed time units for pronouncing the particular character. Tone-1, then, can be
graphed as the line connecting the points (1,5), (2,5), (3,5) and (4,4). Tone-2 is the line
connecting the points (1,3) and (4,5). Tone-3, a little strange, is the line connecting the
points (1,2), (2,1), (3,1), (4,1) and (5,4). Tone-4 is the line connecting the points (1,4),
(2,3), (3,2) and (4,1). Consult [Ref. 3: p. 34] for detailed information. The appendix of
[Ref. 7] provides an intensive discussion of the Chinese tones from the viewpoint
of spectrographic evidence.


¹⁰ Homonyms are words/characters that are spelled and pronounced alike but are
different in meaning.


TABLE 6
INITIAL + GLIDE + FINAL SOUND COMBINATIONS

* are sounds actually formed by (Vowel + Nasal).
n for newly add.

The 5-point Chinese tone system is, in this author's opinion, obtained by adding
an extra level between level 3 and level 2 and between level 2 and level 1 of the English
pitch system. A simplified version is sufficient to express these tones. In
this new version, similar to English, tone-1 is given the symbol 55; tone-2, 35; tone-3,
214; tone-4, 51. Two utterances, numbered 01 and 50, selected from Appendix C are
displayed as examples in Figure 3.1 and Figure 3.2. Again, the pitch changes occur
only at the vowel sound of each character.
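
The simplified tone symbols can be read digit by digit as pitch levels, with 1 the lowest and 5 the highest. The Python sketch below is an illustration only. The function names and the contour labels (level, rising, falling, falling-rising) are assumptions introduced here; the symbols and the meanings of <ma> are those given in the text.

```python
# Simplified tone symbols: each digit is a pitch level (1 low, 5 high).
TONE_SYMBOLS = {1: "55", 2: "35", 3: "214", 4: "51"}

# Meanings of <ma> under each tone, as given in the text.
MA_MEANINGS = {1: "mother", 2: "numb", 3: "horse", 4: "to scold"}


def pitch_levels(tone):
    """Return the pitch levels of a tone as a list of integers."""
    return [int(digit) for digit in TONE_SYMBOLS[tone]]


def contour(tone):
    """Describe the overall pitch movement of a tone (labels are illustrative)."""
    levels = pitch_levels(tone)
    steps = [b - a for a, b in zip(levels, levels[1:])]
    if all(s == 0 for s in steps):
        return "level"
    if all(s >= 0 for s in steps):
        return "rising"
    if all(s <= 0 for s in steps):
        return "falling"
    return "falling-rising" if steps[0] < 0 else "rising-falling"


for tone in sorted(TONE_SYMBOLS):
    print(f"tone-{tone}: {TONE_SYMBOLS[tone]} {contour(tone)} "
          f"(<ma> = {MA_MEANINGS[tone]})")
```

A representation of this kind keeps the tone separate from the sound combination, so that the same sound <ma> can be paired with each of the four tones.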


Figure 3.1 Examples of Chinese Tone Patterns.


Figure 3.2 Examples of Chinese Tone Patterns.


The rule for the intonation of Chinese is, in a sense, relatively simple. Basically,
each character in a sentence keeps the same tone pattern that it has when it appears
independently. Therefore, when the two examples shown in Figure 3.1 and Figure
3.2 are combined, a complete imperative sentence is obtained, which means in English, “Abort the
high speed reader.” However, the spectrographic evidence showed that, although the
tone pattern was generally maintained, both the elapsed time units and the pitch levels
that the character sound passed through were slightly reduced when the character appeared
within a sentence.


IV. A DESCRIPTION OF THE EXPERIMENTS 


A. OBJECTIVES 

The experiment was actually a package of three related subexperiments. The first
one was to examine the recognition of Chinese phonemes. A similar experiment
examining the recognition of English phonemes was also conducted to obtain a
reference for comparison. The second part was to examine the recognition of a set of
ninety Chinese utterances¹¹ in a simulated speaker independent¹² mode. More
information can be found in [Ref. 10]. The third part of the experiment was to
examine the recognition of the same set of Chinese utterances in speaker dependent¹³
mode.

The objective of the experiment was to determine if Chinese speech could be an
effective communication medium between human beings and computer systems. Since
no similar study had been conducted, especially using Chinese phonemes, the
information obtained would, hopefully, serve as a basis for further Chinese speech
recognition/input studies.


B. SUBJECTS 

The first part of the experiment was conducted by the author himself, because it
required a thorough understanding of articulatory phonetics. Ten subjects
participated in the remaining parts of the experiment on a volunteer basis. All of the
subjects were male students from the Republic of China studying at the Naval
Postgraduate School. Two civilian students were working on their doctoral degrees.
The remaining eight subjects were naval officers working on their master’s
degrees. The ranks of those officers ranged from Lieutenant Junior Grade to
Lieutenant inclusive. All subjects were between the ages of 24 and 32 inclusive. Only
two of the subjects had ever heard about voice recognition before, and none of
the subjects had any previous experience with the voice recognition system used in the
experiment.


¹¹ An utterance can be spoken words, phrases or any form of voice that is
meaningful to the speaker.

¹² A speaker independent system contains algorithms which supposedly can handle
many different voices and dialects. The system should be able to recognize the voice of
anyone who tries to use it. Since it requires no previous samples of a given user's
voice, we would not, theoretically, expect a speaker independent system to work
as well as a speaker dependent system.

¹³ A speaker dependent system requires samples of the potential user’s voice to be
in memory in order to work properly. Because it is tuned to the user's voice, the
speaker dependent system should work better than an independent system.


C. EQUIPMENT 

A T600 voice recognition system of Threshold Technology Inc. (TTI) was used
as the recognizer for the experiment. The model T600 is a speaker-dependent, isolated
utterance recognizer. The recognition unit contains memory which allows a
maximum of 256 spoken utterances to be stored. The T600 requires the length of each
utterance to be between one tenth of a second and two seconds. A pause of at
least one tenth of a second between utterances is also required to signal that the first
utterance has ended and a second utterance may be coming. Each utterance can be
associated with an ASCII string of at most 16 characters as the recognizer output to a host
computer system. In this experiment, however, the output string was only displayed
on the screen of a local terminal for the purpose of verifying a correct recognition.

The system comprises a TTI model 8036-3 main processor unit, a TTI 7020A
tape cartridge unit, a TTI 8013 speech level control unit, an Ann Arbor model 400
large-character keyboard/display terminal and a Shure SM-10 noise-cancelling
microphone with headset. Please consult [Refs. 11,12] for more information.
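
The operating limits stated above can be collected into a small check on a planned vocabulary. The Python sketch below is an illustration only. It is not an interface of the T600; the class and function names are hypothetical, and only the numeric limits come from the description above.

```python
from dataclasses import dataclass

MAX_VOCABULARY = 256      # utterances the recognition unit can store
MIN_UTTERANCE_S = 0.1     # shortest utterance accepted, in seconds
MAX_UTTERANCE_S = 2.0     # longest utterance accepted, in seconds
MIN_PAUSE_S = 0.1         # required pause between utterances, in seconds
MAX_OUTPUT_CHARS = 16     # longest ASCII output string


@dataclass
class VocabularyItem:     # hypothetical record, not a T600 data structure
    output: str           # string sent to the host when recognized
    duration_s: float     # spoken length of the utterance


def item_problems(item):
    """List the stated limits that a single vocabulary item violates."""
    problems = []
    if not item.output.isascii() or len(item.output) > MAX_OUTPUT_CHARS:
        problems.append("output must be at most 16 ASCII characters")
    if not MIN_UTTERANCE_S <= item.duration_s <= MAX_UTTERANCE_S:
        problems.append("utterance must last between 0.1 and 2 seconds")
    return problems


def pause_separates(pause_s):
    """True when a pause is long enough to separate two utterances."""
    return pause_s >= MIN_PAUSE_S


def vocabulary_fits(items):
    """True when the vocabulary respects the storage and per-item limits."""
    return len(items) <= MAX_VOCABULARY and not any(item_problems(i) for i in items)
```

A ninety-item vocabulary such as the one used in this experiment is well within the storage limit of 256 utterances.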


D. VOCABULARY 

The vocabulary used in this experiment was a group of ninety computer related
terms selected from [Ref. 13]. The first priority for the selection was to cover as many
sounds as possible. Tables 7 and 8 give readers an idea of which sounds
were used in the experiment. The numbers in both tables represent the number of times that
a particular sound was used. The second priority was to distribute the
utterances equally among the different length categories. However, this attempt was not
successful, because the Chinese tend to form their terms with two
or four characters to obtain a sense of symmetry. Hence, the vocabulary ended up with
28 two-character utterances, 16 three-character utterances, 26 four-character
utterances, 15 five-character utterances and 5 six-character utterances. Appendix C
lists all the Chinese utterances in their romanized forms. The number following
each character represents the tone of the character, and the last number, enclosed in
parentheses, is the number of syllables in the utterance. Of course, the last number
also represents the number of characters in the utterance. Appendix D provides the
English meanings of the Chinese utterances used and lists them in alphabetical
order.
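
The notation described for Appendix C lends itself to mechanical checking. The Python sketch below is an illustration only. The sample entry is hypothetical and is built from the sound <ma> discussed earlier; the exact layout of the appendix entries (spacing, lower-case letters) is an assumption, as are the function and pattern names.

```python
import re

# Hypothetical entry in the style described for Appendix C: each romanized
# character is followed by its tone, and the syllable count is enclosed in
# parentheses at the end.
SAMPLE_ENTRY = "ma1 ma3 (2)"

ENTRY_PATTERN = re.compile(r"^(?P<body>.+?)\s*\((?P<count>\d+)\)\s*$")
SYLLABLE_PATTERN = re.compile(r"([a-z]+)([1-4])")


def parse_entry(entry):
    """Split an entry into (sound, tone) pairs and check the syllable count."""
    match = ENTRY_PATTERN.match(entry)
    if match is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    syllables = [(sound, int(tone))
                 for sound, tone in SYLLABLE_PATTERN.findall(match["body"])]
    if len(syllables) != int(match["count"]):
        raise ValueError(f"syllable count mismatch in {entry!r}")
    return syllables


# Length distribution of the vocabulary as reported in the text.
LENGTH_DISTRIBUTION = {2: 28, 3: 16, 4: 26, 5: 15, 6: 5}

print(parse_entry(SAMPLE_ENTRY))
print(sum(LENGTH_DISTRIBUTION.values()))
```

The five length categories reported in the text add up to the ninety utterances of the vocabulary.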


TABLE 7
I+F SOUND COMBINATIONS USED IN THE EXPERIMENT

E. PROCEDURE 

TABLE 8
I+G+F SOUND COMBINATIONS USED IN THE EXPERIMENT


The entire experiment was conducted in the evening or on weekends to avoid any
possible noise interruption. All subjects were gathered in advance and given a brief orientation
on the experiment itself and on its procedure. Discussions
were also held to ensure that the subjects got a feel for the experiment. Subjects
were asked to come, one at a time, to the Man-Machine Interface Lab to conduct the
experiment. First, the recognition system was loaded with voice samples of the author,
which had been prerecorded in a training session. The subjects then read in each utterance
three times through a microphone against the author’s reference templates¹⁴. The
author recorded the outputs shown on the terminal screen. Two or more
wrong outputs out of three (including no output displayed, in which case the system gave a beep
sound) for an utterance was considered an incorrect recognition; otherwise, a correct
recognition. After this simulated speaker independent mode was completed, the
subjects were instructed to retrain all ninety utterances by introducing their individual voice
samples into the recognition system. When the training was done, the subjects
read in each utterance again, this time five times. Since a speaker dependent
mode recognition was now being conducted, the criterion was made stricter. Unless five correct
outputs in a row were recorded on the first trial, the recognition was considered
incorrect. When an utterance could not be correctly recognized at all, retraining
was allowed until a correct recognition was finally obtained. After all results were
recorded, the experiment was concluded.


¹⁴ A template is the digital representation or matrix of the utterance which is
stored by the recognizer and used later as a reference to perform recognition.
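
The two scoring criteria used in the procedure can be stated compactly. The Python sketch below is an illustration only; the function names are assumptions introduced here, and a reading that produced no output (only a beep) is represented by `None`.

```python
def independent_mode_correct(outputs, expected):
    """Simulated speaker independent mode: three readings per utterance.

    A missing output (the recognizer only beeps) is passed as None and
    counts as wrong. Two or more wrong outputs make the recognition incorrect.
    """
    if len(outputs) != 3:
        raise ValueError("expected three readings")
    wrong = sum(1 for out in outputs if out != expected)
    return wrong < 2


def dependent_mode_correct(outputs, expected):
    """Speaker dependent mode: all five first-trial readings must be correct."""
    if len(outputs) != 5:
        raise ValueError("expected five readings")
    return all(out == expected for out in outputs)


print(independent_mode_correct(["ABORT", "ABORT", None], "ABORT"))
print(dependent_mode_correct(["ABORT"] * 4 + [None], "ABORT"))
```

The dependent mode criterion is deliberately the harsher of the two: a single wrong or missing output among the five readings marks the utterance as incorrectly recognized.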
V. THE RESULTS, DISCUSSIONS AND THE SUGGESTIONS


A. THE RESULTS OF PHONEME RECOGNITION

Tables 9 and 10 respectively provide a copy of the record of the recognition of
English and Chinese phonemes performed by the T600 VRS. It was a 21-session experiment
conducted at most once a day within a period of two months. The author read every
phoneme five times during each session and recorded the number of times, out of every
five trials, that the phoneme was correctly recognized. The author also retrained the
recognizer on those phonemes that could not be properly recognized at the end of the
1st, 5th, 9th, 13th and 17th sessions. A further discussion is provided in the following
paragraphs.

The information provided in Table 11 is directly derived from Tables 9 and 10.
The second and fifth columns of the table are the total number of times, out of 105 trials, that
each phoneme was correctly recognized during the entire experiment. The third and
sixth columns are the averages of each total number over 21 sessions. The percentage
of each total number is also listed in columns four and eight. From the information
provided in this table, we may obtain some idea about the recognition of phonemes.
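
The derivation of a Table 11 row from the session records can be sketched as follows. This Python sketch is an illustration only; the function name is an assumption, and the example record is hypothetical rather than taken from Tables 9 or 10.

```python
SESSIONS = 21
TRIALS_PER_SESSION = 5


def summarize(session_counts):
    """Return (total, average per session, percentage) for one phoneme."""
    if len(session_counts) != SESSIONS:
        raise ValueError("expected one count per session")
    total = sum(session_counts)
    average = total / SESSIONS
    percent = 100 * total / (SESSIONS * TRIALS_PER_SESSION)
    return total, round(average, 2), round(percent, 2)


# Hypothetical record: perfect in 16 sessions, four out of five in the other 5.
example = [5] * 16 + [4] * 5
print(summarize(example))
```

A phoneme recognized in every trial would have a total of 105, an average of 5.00 and a percentage of 100.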

First, the recognition of vowels is better than the recognition of consonants. The
table is designed in a format that presents single vowels first, diphthongs second and
consonants last. A comparison between the upper half and the lower half of the
table helps to illustrate this finding. Second, among vowels, the recognition of
diphthongs is better than the recognition of single vowels. The diphthongs include
</ai/>, /ow/, /oy/ and <ao>. Third, among single vowels, the recognition of Tense
Vowels is better than the recognition of Lax Vowels. The Tense Vowels include
</ee/>, </ei/>, /au/, </oo/>, </oa/> and /ur/. Fourth, among
the consonants, the recognition of voiced sounds is better than the recognition of
voiceless sounds. The voiceless sounds include </p/>, </t/>, </k/>, </f/>, /th/,
</s/>, <hs>, <sh>, /sh/, <ts>, <tsh> and </ch/>. A comparison between
the voiced and voiceless sounds of each pair can help one to appreciate this finding.

An overall comparison between the phoneme recognition of the two languages is
shown at the end of Table 11. In total, 89.59% of the trials were correctly
TABLE 9
RECOGNITION OF ENGLISH PHONEMES

Figures here represent the number of times
correctly recognized out of five trials.
TABLE 10
RECOGNITION OF CHINESE PHONEMES

Figures here represent the number of times
correctly recognized out of five trials.
TABLE 11
RECOGNITION PERFORMANCE OF PHONEMES


recognized in English phoneme recognition and 88.25% of the total trials in Chinese
phoneme recognition. There is only a 1.34% difference between the two languages.
This finding strongly suggests that Chinese speech should also be capable of being a bridge
between human beings and computer systems.

Table 12 is derived from Table 11 by deleting those phonemes that
exist in one language only. Therefore, the 28 phonemes shown in Table 12, which are 68.29% of the
total number, are shared by the two languages. Since the
author used the same recognizer to examine the phonemes of the two languages, the
results of the recognition for each pair of phonemes should be very similar or even the
same. However, as shown in the table, this is not true. Some significant degradations
within the pairs are observed. The reason for this is, probably, the existence of
certain sounds having similar characteristics. That is, the more similar the phonemes
in a group are, the worse the recognition of those phonemes will be.

During the experiment, the author noticed and recorded some consistent
substitution errors that occurred with certain sounds. A substitution error occurs when an
input utterance is calculated to be a closer match to a different template in storage,
causing an incorrect recognition. The information about these substitution errors has
been put in the columns titled ‘SUB’ of the table. Although not every observed degradation
has a recorded consistent error, the existing information provides a possible
explanation for the degradations.
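
The idea of a substitution error can be illustrated with a nearest-template match. The Python sketch below is an illustration only. The matching method actually used by the T600 is not described in this thesis; the Euclidean distance, the three-element feature vectors and all names below are assumptions chosen to show how an input can fall closer to the template of a similar sound.

```python
import math

# Hypothetical feature vectors standing in for stored templates.
TEMPLATES = {
    "s": [0.9, 0.1, 0.4],
    "ts": [0.8, 0.2, 0.5],
    "m": [0.1, 0.9, 0.3],
}


def recognize(features, templates=TEMPLATES):
    """Return the label of the stored template closest to the input."""
    return min(templates, key=lambda label: math.dist(features, templates[label]))


def is_substitution_error(features, spoken_label, templates=TEMPLATES):
    """True when the input matches a different template than the spoken one."""
    return recognize(features, templates) != spoken_label


# An <s> spoken so that it lands slightly nearer the <ts> template.
noisy_s = [0.82, 0.18, 0.48]
print(recognize(noisy_s), is_substitution_error(noisy_s, "s"))
```

In this sketch the two similar templates lie close together, so a small variation in the input is enough to cause a substitution, which is consistent with the explanation offered above for the observed degradations.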
TABLE 12
RECOGNITION PERFORMANCE OF SHARED PHONEMES

             English                Chinese
Phoneme   TOT    AVE   SUB       TOT    AVE   SUB
ee        105   5.00             105   5.00
i         100   4.76              99   4.71
ei        105   5.00             105   5.00
ea         97   4.62   au        105   5.00
oo         95   4.52             103   4.90
o         102   4.86              80   3.81   eng
oa        105   5.00             104   4.95
a         105   5.00              98   4.67
u         104   4.95             102   4.86
e          97   4.62              88   4.19
ai        105   5.00             105   5.00
m          81   3.86              78   3.71
n          99   4.71              99   4.71
ng         78   3.71              83   3.95
p          89   4.24              83   3.95
b          79   3.76              99   4.71
t          77   3.67              89   4.24
d         101   4.81              97   4.62
k          81   3.86              89   4.24
g          80   3.81              95   4.52
f          72   3.43   th         87   4.14
s          76   3.62   th         76   3.62   ts
h          91   4.33              90   4.29
ch         98   4.67              88   4.19   tsh
j          99   4.71              85   4.05   dr
l          93   4.43              94   4.48
y         102   4.86              89   4.24
w          97   4.62             102   4.86


B. THE RESULTS OF CHINESE UTTERANCE RECOGNITION 

Table 13 provides general information about the performance of the subjects
during the Chinese utterance recognition experiment. The first column lists the
number of the subject. The second column provides the number of correct
recognitions by the recognizer for each subject in the speaker independent mode of the
experiment. The third column is the percentage of that number against the total
vocabulary of ninety utterances stored in the recognizer. The fourth column provides the
number of incorrect recognitions by the recognizer for each subject in the speaker
dependent mode. The fifth column is again the percentage. The last column lists the
number of vocabulary items that each subject had to retrain to finally obtain a correct
recognition in the speaker dependent mode experiment. Overall information is
provided at the end of Table 13, which shows that 74.67% of 900 trials of simulated
speaker independent mode recognition were correctly recognized by the recognizer and
12.44% of 900 trials of speaker dependent mode recognition were, on the first attempt,
incorrectly recognized by the recognizer. Only 16 utterances required retraining to
eventually obtain a correct recognition.
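
The overall percentages can be converted back into counts of recognitions. The Python sketch below is an illustration only; the function name is an assumption, and the counts it prints are derived from the rounded percentages in the text rather than read from Table 13.

```python
SUBJECTS = 10
VOCABULARY = 90
trials = SUBJECTS * VOCABULARY  # one scored recognition per subject and utterance


def count_from_percent(percent, total):
    """Recover the whole-number count behind a rounded percentage."""
    return round(percent * total / 100)


independent_correct = count_from_percent(74.67, trials)
dependent_incorrect = count_from_percent(12.44, trials)

print(trials, independent_correct, dependent_incorrect)
print(f"{independent_correct / trials:.2%}", f"{dependent_incorrect / trials:.2%}")
```

Each of the 900 trials here is one scored recognition of one utterance by one subject, not a single reading of the utterance.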

The speaker independent mode, as mentioned earlier, is expected to perform worse
because contemporary techniques cannot fully support the function. A
correct recognition in this mode is hence more meaningful and deserves more attention.
On the other hand, the speaker dependent mode is expected to perform better
because it is fully supported by the subject’s own voice. Therefore, an incorrect
recognition in this mode conveys more information and deserves more
attention. Table 14 lists all ninety utterances used in the experiment. The first and
fifth columns are the number of each utterance. The second and sixth columns are the
number of syllables that each utterance has. The third and seventh columns are the
number of correct recognitions for each utterance. The percentage of correct
recognitions against the total trials, which is 10, of each utterance is provided in
columns four and eight. Table 15 is similar to Table 14. The only difference is that
the third and seventh columns provide the number of incorrect recognitions for each
utterance.

Table 16 provides information on the relationship between the number of syllables
of each utterance and the recognition performance. The first column lists the numbers
of syllables in the Chinese utterances. The second column lists the number of
utterances formed by that number of syllables. The third column,
I(ndependent) and C(orrect), was derived from Table 14 by selecting those utterances with
nine or more correct recognitions. The fourth column is the percentage of that
number against the total number of utterances of that syllable length. The
fifth column, D(ependent) and I(ncorrect), was derived from Table 15 by selecting
those utterances with two or more incorrect recognitions. The sixth column, again, is
the percentage.
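
The selection rules behind Table 16 can be sketched as a grouping by syllable count. The Python sketch below is an illustration only. The function name and the dictionary layout are assumptions, and the rows are a small excerpt from Tables 14 and 15 (utterances 01, 07, 13, 19 and 45), so the printed figures are not those of Table 16 itself.

```python
from collections import defaultdict

# (utterance number, syllables, correct in independent mode,
#  incorrect in dependent mode); an excerpt only, not all ninety rows.
ROWS = [
    (1, 4, 9, 1),
    (7, 6, 10, 3),
    (13, 3, 2, 1),
    (19, 2, 10, 4),
    (45, 3, 7, 0),
]


def summarize_by_length(rows, min_correct=9, min_incorrect=2):
    """Count, per syllable length, the utterances meeting each selection rule."""
    summary = defaultdict(lambda: {"utterances": 0, "I&C": 0, "D&I": 0})
    for _number, syllables, correct, incorrect in rows:
        entry = summary[syllables]
        entry["utterances"] += 1
        entry["I&C"] += int(correct >= min_correct)
        entry["D&I"] += int(incorrect >= min_incorrect)
    return dict(summary)


for length, entry in sorted(summarize_by_length(ROWS).items()):
    share = 100 * entry["I&C"] / entry["utterances"]
    print(length, entry, f"{share:.0f}%")
```

Applied to all ninety rows, the same grouping would yield the counts and percentages of the third to sixth columns described above.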


The author has no intention of making any further statistical analysis because of
the limited scale of the experiment itself and of the data collected. However, the author
has reached the goal that he set for the experiment. That is, the information
collected supports the conclusion that Chinese speech, together with speaker dependent
recognition, could be a good communication medium with computer systems.
Applications of Chinese speech recognition/input are expected in the future. Some
possible applications in the near future are production line routing, quality control,
inventory control, package sorting and some military applications such as weapon
systems control and Combat Information Center operations.


TABLE 13
GENERAL PERFORMANCE OF THE SUBJECTS

Subject   Indep.     %    Depen.       %    Retraining
          Correct         Incorrect         Needed
C. SUGGESTIONS FOR THE FUTURE

This thesis has contributed to clarifying some existing confusion in the Chinese
phonetic system. The main purpose was to establish a solid, error-free basis for
researchers in their future studies of Chinese voice recognition and voice input, so that
those researchers will no longer base their studies on a questionable phonetic
system.

This study of the phonetic system is also quite future oriented. In speaker
dependent recognition, the voice samples the system stores are directly obtained from
TABLE 14
CORRECT RECOGNITIONS IN INDEPENDENT MODE

NO   SYL   NUM     %        NO   SYL   NUM     %
01     4     9    90        46     5    10   100
02     4     7    70        47     5     8    80
03     3     7    70        48     5    10   100
04     2     7    70        49     4     9    90
05     2     5    50        50     5     8    80
06     4     7    70        51     5     6    60
07     6    10   100        52     4     7    70
08     5     6    60        53     5     8    80
09     3     6    60        54     5     8    80
10     2     9    90        55     2     8    80
11     4     7    70        56     3     6    60
12     3    10   100        57     3     5    50
13     3     2    20        58     2    10   100
14     3     6    60        59     2     7    70
15     4     6    60        60     2     4    40
16     5     7    70        61     2     5    50
17     2     3    30        62     2    10   100
18     5     5    50        63     3     9    90
19     2    10   100        64     4     6    60
20     3     5    50        65     2     6    60
21     6    10   100        66     3    10   100
22     4     8    80        67     2     4    40
23     2     6    60        68     3     7    70
24     5     9    90        69     2     8    80
25     2     4    40        70     4    10   100
26     2     5    50        71     4    10   100
27     2     9    90        72     2     8    80
28     4     9    90        73     4    10   100
29     6    10   100        74     4    10   100
30     3     4    40        75     5     8    80
31     4    10   100        76     4     6    60
32     4     2    20        77     5    10   100
33     4     4    40        78     4     7    70
34     2     6    60        79     2     8    80
35     4     4    40        80     2     9    90
36     2     4    40        81     4    10   100
37     6     9    90        82     4    10   100
38     2    10   100        83     5     9    90
39     6    10   100        84     3    10   100
40     4     8    80        85     4     6    60
41     5     8    80        86     4     7    70
42     2     7    70        87     2    10   100
43     4     4    40        88     3    10   100
44     2     6    60        89     3     8    80
45     3     7    70        90     2    10   100


the user himself. Therefore, no matter how the user pronounces his input, so long as it 
matches the way he pronounced it during the training session, the system will correctly 
recognize it. So, a phonetically wrong pronunciation will cause no trouble in a speaker 
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TABLE 15 
INCORRECT RECOGNITIONS IN DEPENDENT MODE 


NO   SYL   NUM     %        NO   SYL   NUM     % 
01     4     1    10        46     5     1    10 
02     4     3    30        47     5     0    00 
03     3     3    30        48     5     1    10 
04     2     2    20        49     4     0    00 
05     2     1    10        50     5     0    00 
06     4     3    30        51     5     3    30 
07     6     3    30        52     4     0    00 
08     5     3    30        53     5     1    10 
09     3     2    20        54     5     1    10 
10     2     0    00        55     2     0    00 
11     4     3    30        56     3     0    00 
12     3     2    20        57     3     2    20 
13     3     1    10        58     2     0    00 
14     3     1    10        59     2     1    10 
15     4     2    20        60     2     2    20 
16     5     0    00        61     2     1    10 
17     2     0    00        62     2     2    20 
18     5     2    20        63     3     1    10 
19     2     4    40        64     4     0    00 
20     3     1    10        65     2     3    30 
21     6     1    10        66     3     0    00 
22     4     1    10        67     2     1    10 
23     2     2    20        68     3     0    00 
24     5     1    10        69     2     2    20 
25     2     2    20        70     4     1    10 
26     2     4    40        71     4     1    10 
27     2     2    20        72     2     4    40 
28     4     0    00        73     4     0    00 
29     6     0    00        74     4     0    00 
30     3     2    20        75     5     1    10 
31     4     1    10        76     4     0    00 
32     4     0    00        77     5     5    50 
33     4     0    00        78     4     0    00 
34     2     1    10        79     2     0    00 
35     4     0    00        80     2     0    00 
36     2     2    20        81     4     0    00 
37     6     1    10        82     4     0    00 
38     2     1    10        83     5     0    00 
39     6     1    10        84     3     0    00 
40     4     4    40        85     4     1    10 
41     5     0    00        86     4     0    00 
42     2     3    30        87     2     1    10 
43     4     1    10        88     3     1    10 
44     2     7    70        89     3     1    10 
45     3     0    00        90     2     0    00 


dependent recognition system. On the other hand, once voice recognition technology 
enters the phase of speaker independent mode, this articulation problem will be a 
factor requiring thorough consideration. The further studies relating to this 
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future-oriented problem, in both English and Chinese voice recognition, should be 
conducted as soon as possible. 

For the same reason, two Chinese phonemes, <i> and <a>, will also be a potential 
trouble area when speaker independent recognition is applied in the future. Consider 
an example. The word "secret" in Chinese is <mi mi>. According to the author's 
argument, there are, in fact, four possible ways to pronounce it; namely, 
<mee mee>, <mi mi>, <mee mi> and <mi mee>. In dependent mode, so long as the user 
remembers which one is the voice sample he input into the system, he will have no 
trouble at all. However, a speaker independent recognition system requires us to 
answer questions such as: Can the three pronunciations other than the one in memory 
be properly recognized? Is vocabulary design a possible way to solve the problem? 
Studies to answer these questions are certainly needed for the development of a 
speaker independent voice recognition system in the near future. 
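
The four pronunciations of <mi mi> arise because each syllable can be spoken in two 
ways. As a minimal sketch only (the function name and the variant table below are 
illustrative assumptions, not part of the experiment), the candidate pronunciations 
that a speaker independent vocabulary would have to cover could be enumerated like 
this: 

```python
from itertools import product

# Hypothetical table: each written syllable maps to the pronunciations a
# speaker might actually produce (values taken from the <mi mi> example).
VARIANTS = {"mi": ["mee", "mi"]}


def pronunciation_variants(syllables, variants=VARIANTS):
    """Return every way an utterance could be pronounced."""
    # Syllables without an entry in the table have a single pronunciation.
    choices = [variants.get(s, [s]) for s in syllables]
    return [" ".join(combo) for combo in product(*choices)]


print(pronunciation_variants(["mi", "mi"]))
```

Under these assumptions the number of candidates doubles with every ambiguous 
syllable, which is one reason vocabulary design is raised above as a possible remedy. 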

Vocabulary design is also an important factor in the performance of voice 
recognition systems. As described in [Ref. 10: p. 4], Prof. Gary Poock suggested using 
longer vocabulary phrases for better recognition performance. The results shown in 
Table 16, however, only partially support his statement. This is probably because the 
author, when selecting the utterances, concentrated his efforts on covering more sound 
combinations. Hence, a further study, with a more careful vocabulary design strategy, 
of the relationship between the number of syllables in an utterance and the 
performance of the recognition system will be helpful. 
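
A first step in such a study could be to pool the per-utterance results by syllable 
count. The sketch below is illustrative only: it uses a handful of rows from Table 14 
(utterance number, syllables, correct recognitions), assumes ten trials per utterance 
as implied by the NUM and % columns, and the function name is hypothetical. The sample 
is too small to support any conclusion. 

```python
from collections import defaultdict

TRIALS_PER_UTTERANCE = 10  # assumption: implied by NUM and % in Table 14

# (utterance number, syllables, correct recognitions): sample rows of Table 14
SAMPLE_ROWS = [
    (46, 5, 10), (47, 5, 8), (51, 5, 6), (56, 3, 6), (60, 2, 4), (62, 2, 10),
    (63, 3, 9), (64, 4, 6), (66, 3, 10), (67, 2, 4), (70, 4, 10),
]


def accuracy_by_syllables(rows, trials=TRIALS_PER_UTTERANCE):
    """Return {syllable count: percent correct}, pooled over utterances."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for _number, syllables, num_correct in rows:
        correct[syllables] += num_correct
        total[syllables] += trials
    return {s: 100.0 * correct[s] / total[s] for s in sorted(total)}


for syllables, pct in accuracy_by_syllables(SAMPLE_ROWS).items():
    print(f"{syllables} syllables: {pct:.1f}% correct")
```

Running the same grouping over all 90 utterances, and over the dependent mode results 
as well, would show how far utterance length alone accounts for the differences. 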
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A study, also prompted by Table 16, of the relationship between phonemes and 
recognition performance would also be appropriate. The experimental results suggested 
that the number of syllables might not be the only determinant of recognition 
performance. Therefore, this new direction of study might provide an alternative way 
to obtain the information needed to seek better recognition performance. 

Last but not least, this study was heavily based on knowledge drawn from 
articulatory phonetics. The author has, in fact, used this knowledge over the past 
several years to help people produce more acceptable American English pronunciation. 
By using the exact speech organs and articulations, his students established much 
better articulation habits and, eventually, produced more acceptable pronunciation. 
Human beings can improve their pronunciation skill with the help of articulatory 
phonetics. Can we, then, apply this knowledge to help a voice recognition system 
obtain better performance? To the author himself, the study to answer this question 
will certainly be a very interesting one and deserves his constant devotion in the 
future. 
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APPENDIX A 
ORIGINAL TABLES USED IN TEXT 


PHONEMES OF AMERICAN SPEECH (ORIGINAL) 

Vowels 

Front vowels                      Back vowels 
SYMBOL   KEY                      SYMBOL   KEY 
[i]      heed    [hid]            [u]      who'd   [hud] 
[ɪ]      hid     [hɪd]            [ʊ]      hood    [hʊd] 
[e]      hayed   [hed]            [o]      hoed    [hod] 
[ɛ]      head    [hɛd]            [ɔ]      hawed   [hɔd] 
[æ]      had     [hæd]            [ɑ]      hod     [hɑd] 

Central vowels                    Diphthongs† 
[ɝ~ɜ]*   hurt    [hɝt]            [aɪ]     file    [faɪl] 
[ʌ]      hut     [hʌt]            [aʊ]     fowl    [faʊl] 
[ɚ~ə]*   under   [ʌndɚ]           [ɔɪ]     foil    [fɔɪl] 
[ə]      about   [əbaʊt]          [ju]     fuel    [fjul] 

Consonants 

Stops                             Fricatives 
[p]      pen     [pɛn]            [f]      few     [fju] 
[b]      Ben     [bɛn]            [v]      view    [vju] 
[t]      ten     [tɛn]            [θ]      thigh   [θaɪ] 
[d]      den     [dɛn]            [ð]      thy     [ðaɪ] 
[k]      Kay     [ke]             [h]      hay     [he] 
[g]      gay     [ge]             [s]      say     [se] 
[tʃ]     chew    [tʃu]            [ʃ]      shay    [ʃe] 
[dʒ]     Jew     [dʒu]            [z]      bays    [bez] 
                                  [ʒ]      beige   [beʒ] 

Nasals and lateral                Glides 
[m]      some    [sʌm]            [w]      way     [we] 
[n]      sun     [sʌn]            [hw]     whey    [hwe] 
[ŋ]      sung    [sʌŋ]            [j]      yea     [je] 
[l]      lay     [le]             [r]      ray     [re] 

* [ɝ] and [ɚ] are the "r-colored" vowels. [ɜ] and [ə] are the pronunciations 
typical of r vowels in Eastern, Southern, and English speech. 
† Does not include the "nondistinctive" and centering diphthongs. 
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THE SOUND COMBINATIONS USED IN CHINESE (ORIGINAL-A) 


[Table of Chinese sound combinations; contents not legible in the scanned source.] 
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THE SOUND COMBINATIONS USED IN CHINESE (ORIGINAL-B) 


[Table of Chinese sound combinations; contents not legible in the scanned source.] 
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APPENDIX B 
TABLE OF CURRENTLY EXISTING CHINESE ROMANIZATION 
SYSTEMS 


NO ) WE WG Sia PY SG 
01   b       p          b       b     b 
02   p       p'         p       p     p 
03   m       m          m       m     m 
04   f       f          f       f     f 
05   d       t          d       d     d 
06   t       t'         t       t     t 
07   n       n          n       n     n 
08   l       l          l       l     l 
09   g       k          g       g     g 
10   k       k'         k       k     k 
11   h       h          h       h     h 
12   j       ch         j(i)    j     j 
13   ch      ch'        ch(i)   q     ch 
14   sy      hs         sh(i)   x     hs 
15   j       ch         j       zh    dr 
16   ch      ch'        ch      ch    tsh 
17   sh      sh         sh      sh    sh 
18   r       j          r       r     r 
19   dz      ts,tz      tz      z     dz 
20   ts      ts',tz'    ts      c     ts 
21   s       s,ss,sz    s       s     s 
22   a       a          a       a     a 
23   o       o          o       o     o 
24   e       e,o        e       e     e 
25   e       eh         e       e     ea 
26   ai      ai         ai      ai    ai 
27   ei      ei         ei      ei    ei 
28   au      ao         au      ao    ao 
29   ou      u,ou       ou      ou    oa 


38 


an 


en 


ang 
eng 
ng 
eh 


1,V1 


u,wu 


Ww 


vw 


bez 


an,en 


en 


ang 
eng 
ng 
era 
1,yl 
Vil 


W,u 
yu,u 
vu,u 


uvih 


an 


en 


ang 
eng 
ng 
er 


1,91 
u,wu 
1u,yvu 


1u 


ine 


an 


eg) 


ang 


eng 


er 
1,y1 

V1 
u,wu 
W,u 
yu,u,10 


yu,u, 


an 


en 


ang 
eng 
ng 
er 


vw 
ih 


APPENDIX C 


THE CHINESE UTTERANCES USED IN THE EXPERIMENT 


NO   Chinese Romanization 
01   i4 tshang2 droong1 drih3 (4) 
02   chiu3 tshwen2 shih2 jyan1 (4) 
03   hsyan4 iung4 dang3 (3) 
04   wei4 drih3 (2) 
05   pei4 drih4 (2) 
06   ing4 iung4 tsheng2 shih4 (4) 
07   dzih4 doong4 dzih1 lyao4 tshoo3 li3 (6) 
08   foo3 droo4 tshoo2 tshwen2 ti3 (5) 
09   ping2 dai4 kwan1 (3) 
10   tyao2 ma3 (2) 
11   dreng3 pi1 tshoo3 li3 (4) 
12   ji1 drweng3 dyan3 (3) 
13   er4 jin4 ma3 (3) 
14   dzih1 lyao4 dwan4 (3) 
15   boo4 lin2 dai4 shoo4 (4) 
16   chi4 pao4 shih4 pai2 hsiu4 (5) 
17   neng2 lyang4 (2) 
18   droong1 yang1 tshoo3 li3 ji1 (5) 
19   dzih4 ywan2 (2) 
20   ma3 drwan3 hwan4 (3) 
21   dyan4 nao3 foo3 droo4 jyao1 hsywea2 (6) 
22   koong4 drih4 dan1 ywan2 (4) 
23   tsih2 droo4 (2) 
24   dzih1 lyao4 koo4 gwan3 li3 ywan2 (5) 
25   drih4 yan2 (2) 
26   shan1 tshoo2 (2) 
27   she4 ji4 (2) 
28   shoo4 wei4 hsyan3 shih4 (4) 
29   tsih2 dyea2 dzwo4 yea4 hsi4 toong3 (6) 
30   ding4 i4 iu4 (3) 
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
31   ting2 ji1 shih2 jyan1 (4) 
32   doong4 tai4 fen1 ge1 (4) 
33   byan1 ji2 tsheng2 shih4 (4) 
34   fang3 dren1 (2) 
35   dang3 an4 droong1 dyan3 (4) 
36   deng1 loo4 (2) 
37   oo4 tsha1 dren1 tshe4 hsi4 toong3 (6) 
38   drih2 hsing2 (2) 
39   wen2 jyan4 tshwan2 dren1 hsi4 toong3 (6) 
40   shih1 oo4 iu4 goo1 (4) 
41   ke3 hsing2 hsing4 yan2 jyoa4 (5) 
42   ren4 ti3 (2) 
43   foo2 dyan3 iung4 shwan4 (4) 
44   lyoa2 tsheng2 too2 (2) 
45   goong1 neng2 byao3 (3) 
46   chywan2 myan4 hsing4 byan4 shoo4 (5) 
47   too2 hsing2 shoo4 wei4 chi4 (5) 
48   ban4 shwang1 goong1 toong1 dao4 (5) 
49   gao1 jyea1 iu3 yan2 (4) 
50   gao1 shoo4 doo2 ka3 ji1 (5) 
51   ing3 hsyang4 tshoo3 li3 ji1 (5) 
52   mai4 tshoong1 dza2 in1 (4) 
53   dzeng1 lyang4 byao3 shih4 fa3 (5) 
54   swo3 ing3 dran4 tswen2 chi4 (5) 
55   drih3 ling4 (2) 
56   jyao1 tan2 shih4 (3) 
57   tsao1 dzoong4 gan3 (3) 
58   gwang1 bi3 (2) 
59   lyan4 chiun2 (2) 
60   dzai3 roo4 (2) 
61   ben3 di4 (2) 
62   hwei2 loo4 (2) 
63   bai3 wan4 droa1 (3) 
64   he2 bing4 fen1 lei4 (4) 
65   mwo2 dzoo3 (2) 
66   hao2 wei1 myao3 (3) 
67   kan3 tao4 (2) 
68   dreng4 gwei1 hwa4 (3) 
69   hsywan3 dze2 (2) 
70   hsiun4 hsi2 bao1 feng1 (4) 
71   tsa2 hsiun2 ji4 chyao3 (4) 
72   bao3 hoo4 (2) 
73   mai4 tshoong1 shwai1 jyan3 (4) 
74   da3 koong3 ka3 pyan4 (4) 
75   lwan4 shoo4 tshan3 sheng1 chi4 (5) 
76   fan4 wei2 he2 dwei4 (4) 
77   byan4 shih4 jing1 chywea4 doo4 (5) 
78   tsan1 kao3 lyea4 byao3 (4) 
79   fan3 she4 shao3 myao2 (2) 
80   jiu4 tshih4 (2) 
81   sih4 foo2 ji1 goa4 (4) 
82   kwai4 drao4 kao3 bei4 (4) 
83   loa4 tsha2 tswo4 oo4 liu4 (5) 
84   yoa3 hsyao4 hsing4 (3) 
85   tshwei2 drih2 kwei4 gei3 (4) 
86   dzwei4 hwai4 drwang4 kwang4 (4) 
87   wei2 hsyea3 (2) 
88   ling2 chi2 byao1 (3) 
89   ling2 i4 drih4 (3) 
90   chiu1 iu4 (2) 
APPENDIX D 


THE ENGLISH EQUIVALENT USED IN THE EXPERIMENT 


NO   English Vocabulary 
01   abort (2) 
02   access time (3) 
03   active file (3) 
04   address (2) 
05   allocation (4) 
06   application program (6) 
07   automatic data processing (9) 
08   auxiliary storage (7) 
09   bandwidth (2) 
10   bar code (2) 
11   batch processing (4) 
12   benchmark (2) 
13   binary code (4) 
14   block (1) 
15   boolean algebra (5) 
16   bubble sort (3) 
17   capacity (4) 
18   central process unit (6) 
19   character (3) 
20   code conversion (4) 
21   computer aided instruction (8) 
22   control unit (4) 
23   cylinder (3) 
24   database administrator (8) 
25   delay (2) 
26   delete (2) 
27   design (2) 
28   digital display (5) 
29   disk operating system (7) 
30   domain (2) 
31   downtime (2) 
32   dynamic partitioning (7) 
33   editor (3) 
34   emulation (4) 
35   end of file (3) 
36   entry (2) 
37   error detection system (7) 
38   execution (4) 
39   facsimile document system (9) 
40   failure prediction (5) 
41   feasibility study (7) 
42   firmware (2) 
43   floating-point operation (8) 
44   flowchart (2) 
45   function table (4) 
46   global variable (6) 
47   graphic digitizer (6) 
48   half-duplex channel (5) 
49   high level language (5) 
50   high speed reader (4) 
51   image processor (5) 
52   impulse noise (3) 
53   incremental representation (9) 
54   index register (5) 
55   instruction (3) 
56   interactive (4) 
57   joystick (2) 
58   light pen (2) 
59   link group (2) 
60   load (1) 
61   local (2) 
62   loop (1) 
63   megacycle (4) 
64   merge-sort (2) 
65   module (2) 
66   nanosecond (4) 
67   nest (1) 
68   normalize (3) 
69   option (2) 
70   packet (2) 
71   polling technique (4) 
72   protection (3) 
73   pulse decay (3) 
74   punch card (2) 
75   random number generator (8) 
76   range check (2) 
77   recognition accuracy (8) 
78   reference listing (5) 
79   reflective scan (4) 
80   rejection (3) 
81   servomechanism (6) 
82   snapshot copy (4) 
83   undetected error rate (7) 
84   validity (4) 
85   vertical feed (4) 
86   worst-case (2) 
87   write only (3) 
88   zero flag (3) 
89   zero suppression (5) 
90   zone (1) 


>? 


10. 


be 


re 
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